Abstract

Background: Greater access to evidence-based psychological treatments is needed. This review aimed to evaluate whether
internet-delivered psychological treatments for mood and anxiety disorders are efficacious, noninferior to established 
treatments, safe, and cost-effective for children, adolescents and adults. 

Methods: We searched the literature for studies published until March 2013. Randomized controlled trials (RCTs) were
considered for the assessment of short-term efficacy and safety and were pooled in meta-analyses. Other designs were also 
considered for long-term effect and cost-effectiveness. Comparisons against established treatments were evaluated for 
noninferiority. Two reviewers independently assessed the relevant studies for risk of bias. The quality of the evidence was 
graded using an international grading system. 

Results: A total of 52 relevant RCTs were identified, of which 12 were excluded due to high risk of bias. Five
cost-effectiveness studies were identified and three were excluded due to high risk of bias. The included trials mainly evaluated
internet-delivered cognitive behavioral therapy (I-CBT) against a waiting list in adult volunteers and 88% were conducted in
Sweden or Australia. One trial involved children. For adults, the quality of evidence was graded as moderate for the
short-term efficacy of I-CBT vs. waiting list for mild/moderate depression (d = 0.83; 95% CI 0.59, 1.07) and social phobia (d = 0.85;
95% CI 0.66, 1.05), and moderate for no efficacy of internet-delivered attention bias modification vs. sham treatment for
social phobia (d = -0.04; 95% CI -0.24, 0.35). The quality of evidence was graded as low/very low for other disorders,
interventions, children/adolescents, noninferiority, adverse events, and cost-effectiveness. 

Conclusions: I-CBT is a viable treatment option for adults with depression and some anxiety disorders who request this
treatment modality. Important questions remain before broad implementation can be supported. Future research would 
benefit from prioritizing adapting treatments to children/adolescents and using noninferiority designs with established 
forms of treatment.

Introduction

A pressing challenge for mental health services is meeting the 
demand for the treatment of depression and anxiety disorders. 
Nearly 40% of the population is estimated to be in need of 
treatment at some time during their life for anxiety or depression 
[1]. Each year 14-18% of the population across the age span suffer
an anxiety disorder and 7-9% suffer from depression in the United 
States as well as in Europe [1,2]. Thus, meeting the needs of 
people suffering anxiety and depression with the current delivery 
methods is a gargantuan task [3-5]. 



Only one third of depressed patients respond fully to 
pharmacotherapy [6] and patients prefer psychological to 
pharmacologic treatment for depression and anxiety at a 3:1 rate 
[7]. Fortunately, cognitive behavioral treatments are helpful for 
anxiety and depression for adults [8-10] and for children and 
adolescents [11]. Other psychological therapies such as interper- 
sonal and psychodynamic therapies have also been reported to 
produce significant improvements [10,12,13]. However, limited 
access to qualified therapists restricts the utility of psychological
treatments. In fact, of those with a serious problem as many as
50% in developed and 85% in undeveloped countries will simply
go untreated [14]. Of those who do receive treatment, rates of
quality care are moderate to low for anxiety disorders [15]. 

The internet has offered a new avenue for providing psychological
treatments, but the effectiveness of these treatments is still
an issue. Most reviews to date have found support for the use of 
internet-delivered cognitive-behavioral therapy (I-CBT) [16-18]. 
For example, a meta-analytical review found that I-CBT was 
helpful for four distinct disorders [19]. Similarly, Hedman et al. 
[20] reviewed randomized controlled trials (RCTs) of I-CBT and 
reported large effects for depression, social phobia and panic 
disorder. While ambitious, extant reviews nevertheless fail to 
address some key issues. 

First, the quality of the evidence needs to be carefully 
considered. In previous reviews, when used at all, quality 
assessments were restricted to a few indices of the internal validity 
of the individual studies. A proper assessment of risk of bias is 
essential to avoid the risk of drawing false conclusions, however, 
and it can be justified to exclude studies of higher risk of bias from 
the synthesis [21]. A recent example is a Cochrane review that 
found moderate clinical effect of exercise on depression when 
including all relevant trials regardless of risk of bias [22]. However, 
when restricting the analysis to the trials with low risk of bias, the 
estimate indicated only a small effect of exercise that did not reach 
statistical significance. 

Furthermore, investigators that conduct systematic reviews and 
meta-analyses are increasingly aware that not only individual 
studies but also the body of evidence needs to be systematically 
evaluated, because the confidence in the pooled effect estimates 
may be compromised not only by risk of bias in individual studies 
but also by several other factors (e.g., imprecision, inconsistency, 
indirectness, and publication bias) [23]. The issue of quality 
assessment is compounded further if reviews are conducted by the 
trial authors themselves [20,24]. For example, the Cochrane 
Collaboration requires an independent assessment of eligibility
and risk of bias by a second author not involved in the study/ 
studies due to potential conflicts of interest [25]. Also, as experts in 
the content area under review they may have pre-formed opinions 
that can influence their assessments [26]. Given that the extant 
reviews were conducted by the trial authors themselves, the field 
would gain additional credibility from an independent evaluation.

Second, the issue of noninferiority has been largely ignored in 
previous reviews, but is necessary when comparing an existing 
evidence-based treatment (e.g., CBT) with a new one (e.g., I- 
CBT). In contrast to investigations of psychological therapy that 
involve new methods in areas where there is no known evidence- 
based treatment, the internet programs wisely employ known 
treatment techniques; only the manner of treatment delivery is 
altered. A greater reach and eventual cost savings could make 
internet therapies viable alternatives in healthcare. A critical issue, 
then, is whether they are noninferior to existing treatment. 
Noninferiority trials have gained increased attention to help in 
clinical decision making as the list of possible treatments grows, 
since a new treatment should be at least not inferior to existing 
evidence-based ones [27]. The methodology for noninferiority 
trials differs from superiority trials [27] and there is a need to
review the literature from this perspective. Previous reviews on 
internet-delivered treatments generally conclude that these treatments
have effects equivalent to the established forms of
treatments [18-20,28]. However, the absence of a significant 
difference between two treatments in a clinical trial is not the same 
as a proof of noninferiority. Furthermore, formal indirect 
comparisons of treatment effect estimates between trials are only 
appropriate if the new and established treatments were compared 
against a reference that is similar both in methods and population
[29], which in this case seems to be an indeterminate presumption.
We therefore believe that the field is ripe for an analysis that 
elaborates on the issue of noninferiority vs. superiority. 

Noninferiority trials are difficult to design and execute well [27]. 
Circumstances that strengthen inferences about superiority, 
because they increase similarities across treatment arms, can have 
the reverse effect on inferences of noninferiority. If a novel 
treatment is in fact inferior to established treatments, a trial with a 
sloppy design will be biased against finding this difference [27]. 
Superiority trials mainly use intention-to-treat (ITT) samples 
whereas noninferiority should be demonstrated also in the per- 
protocol analysis because an ITT analysis tends to dilute 
differences. Furthermore, there should be a fairness of compar- 
isons between the new and established treatment, such that the 
established treatment is implemented rigorously under conditions 
that do not compromise the assay sensitivity. For example, if many
subjects in a trial have previously failed to respond to the control 
treatment, there would be a bias in favor of the new treatment 
[30]. Noninferiority trials could also provide data for whether 
internet therapies are cost-effective, with important implications 
for healthcare. 

Third, the previous reviews have largely ignored potential 
adverse events (e.g., harms, side effects, and deterioration), which 
may prove important for implementation of remotely delivered 
psychological treatments. Finally, reviews to date have focused on 
CBT, while trials of other treatments have begun to emerge [31]. 

The current review addresses all of the above issues. It has been 
conducted under the auspices of the Swedish Council on Health 
Technology Assessment (SBU), a government agency that has 
produced numerous systematic reviews evaluating the effects of 
various treatments (www.sbu.se/en/). The overall aim of this 
report is to provide a systematic review of the literature evaluating 
internet-delivered psychological treatment for mood and anxiety 
disorders with attention to methodological quality, consideration 
of the noninferiority perspective, and with ratings of the quality of 
the evidence using Grading of Recommendations Assessment, 
Development and Evaluation (GRADE) [23] by a freestanding 
council. Specifically, the following questions guided the review 
(additional questions were addressed in the governmental report):

1. Is internet-delivered psychological treatment efficacious, safe
and cost-effective for mood and anxiety disorders in children, 
adolescents and adults? 

2. Is internet-delivered treatment noninferior to established 
psychological treatments? 

Methods 

Protocol and registration 

This systematic review was conducted at SBU. The inclusion
criteria were pre-specified and a protocol was registered in
advance internally at SBU (ref no UTV2012/26), see Protocol
S1.

Eligibility criteria 

Only published studies in English were considered for this 
review. The criteria for eligibility included the following charac- 
teristics. 

Patients. Children, adolescents and adults with anxiety or
mood disorders according to the manuals of the American
Psychiatric Association [32] and the World Health Organization
[33]. The specific diagnoses included were major depressive 
disorder, dysthymia, bipolar disorder, social phobia, panic
disorder, generalized anxiety disorder (GAD), posttraumatic stress
disorder (PTSD), obsessive-compulsive disorder (OCD), specific 
phobia, and separation anxiety (in children and adolescents). 
Studies were excluded if the participants were selected primarily 
because of a specific physical illness. 

Interventions. Internet-delivered psychological treatments, 
defined as interventions based on an explicit psychological theory, 
not conducted at a clinic, and delivered to the patients via the 
internet. Any support had to be remotely delivered (e.g. email-like 
messages or telephone). The degree of support was categorized 
into pure self-help (no support), technician-assisted (e.g., non- 
clinical), or therapist-guided (i.e., clinical support). 

Comparator. Any established psychological treatments, wait- 
ing list, usual care, or attention control. 

Outcome. Change in symptoms of the primary disorder, 
adverse events, and cost per effect and per quality-adjusted life- 
years. 

Study design. For short-term effects and risk of adverse
events only RCTs were included. For long-term follow-up
assessments (i.e., ≥6 months after post-assessment) RCTs and
observational studies were included because of the ethical and 
practical dilemmas of conducting long-term RCTs. For cost- 
effectiveness data, economic evaluations based on individual-level 
data and decision models were eligible. 

Information sources 

Electronic searches were conducted using Medical Subject 
Headings (MeSH) and relevant text word terms. The databases 
used were PubMed, Cochrane Library, CINAHL, PsycINFO, 
Psychology and Behavioral Sciences Collection (PBSC), TRIP 
database and CRD, up to March 4, 2013. 

Search strategy 

We used search terms for depression/mood and anxiety and for 
each disorder (e.g., panic, phobia), for a range of delivery methods 
(e.g., online, internet, web, computer, phone), and for therapy, 
psychotherapy, intervention, and terms for specific interventions 
(e.g., cognitive behavioral, psychodynamic, interpersonal). The 
detailed search strategies are found in Appendix S1.

Study selection 

Two reviewers independently screened the titles and abstracts
identified by the search strategy. All studies of potential relevance 
according to the inclusion criteria were obtained in full text and 
two reviewers independently assessed them for inclusion. Any 
disagreements were resolved by discussions. Reference lists were 
screened for additional studies of relevance. Appendix S2 lists the 
efficacy and cost-effectiveness reports that were excluded after full- 
text reading. 

Data collection process 

From each included study of moderate or low risk of bias (see 
below), data were extracted and inserted in a table by one reviewer.
A second reviewer audited the data extraction. Any disagreements 
were resolved by discussion. 

Data items 

Information was extracted from included trials on (1) partici- 
pants (age, education, diagnosis, and method of diagnostic 
assessment); (2) treatment (including treatment paradigm, level of 
support, duration); (3) type of comparator; (4) outcome measures of
core symptoms; (5) adverse events or deterioration; (6) costs. 



Risk of bias in individual studies 

Two reviewers independently assessed the risk of bias with the 
use of checklists developed for each relevant study design [34]. 
Risk of bias is the systematic tendency that any aspect of the study 
makes the estimated treatment effect deviate from its true value, 
that is, the extent to which results of an included trial can be 
believed. The checklist for RCTs used herein is highly similar to
the Cochrane Collaboration's tool for assessing risk of bias [26]
and includes 31 items to consider for the randomization (methods
and outcome; 3 items), treatment (blinding, compliance, therapists,
confounds; 5 items) and assessment (blinding, reliability, validity, 
timing, analysis; 9 items) of the participants, dropout (size, balance, 
covariates, analysis; 5 items), reporting bias (protocol, primary/ 
secondary outcome, adverse events, assessment, 6 items), and 
conflicts of interest (3 items). A rating of low, moderate or high risk 
of bias was given to each category of items and was combined into 
a global rating of the trial. 

Trials that had a serious flaw were rated as high risk of bias; 
trials that met all or nearly all criteria were rated as low risk of bias, 
such as trials with a convincing comparator (e.g., an established 
treatment or a sham version of attention bias modification) and
no other obvious risk of bias; the remainder were rated as 
moderate risk of bias. Trials of moderate risk vary in their 
strengths and weaknesses: some trials likely provide valid results 
while others are only possibly valid. A high-risk trial is not valid; 
the results are at least as likely to reflect flaws in the study design as 
true differences among the trial arms. A fatal flaw may be reflected 
by one aspect introducing a high risk of bias or by failure to meet 
combinations of item criteria. The reviewers agreed on rules-of-thumb
for decisions on categories and paid particular attention to trials that
had, for example, N<30, dropout >20%, or unbalanced baseline
characteristics. We included trials for the evaluation of long-term
effects if they had a dropout rate of less than 30% and reported on 
other treatments during the follow-up period. For health-economic 
studies to be included they had to report both costs and effects. 
Any disagreements were resolved by consensus or by arbitration 
by a third reviewer. If necessary, study authors were solicited to 
provide additional information. Only studies with low or moderate 
risk of bias were used for further synthesis. 

Planned method of analysis 

We included as noninferiority designs all comparisons of 
internet-delivered treatment vs. established psychological therapies 
(e.g. I-CBT compared with individual therapist-led CBT). For 
these comparisons we used a predefined noninferiority margin of 
d = -0.2, chosen because it relates to a small effect size [35] and to
ensure that noninferior treatments would retain an advantage over 
no treatment. All other comparisons were evaluated as superiority 
designs. Meta-analyses were carried out in RevMan 5. The 
calculations of the standardized mean differences were based on 
the groups' sample sizes, means and standard deviations at post-treatment.
If the number of participants at post-treatment was
not reported, the group sizes at randomization were used.
Random effects models were used. All effect sizes in this report
refer to between-group effects. Costs were converted to USD and 
the 2013 price-level [36]. 
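
The calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run in RevMan 5: the function names are hypothetical, the sign convention (positive d favors the internet-delivered treatment, assuming lower scores mean fewer symptoms) is an assumption, the DerSimonian-Laird estimator is assumed as the random-effects method, and the sketch omits any small-sample correction that meta-analysis software may apply.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Between-group standardized mean difference at post-treatment.

    Assumption: lower scores mean fewer symptoms, so d is signed so that
    positive values favor the treatment group.
    """
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    )
    d = (mean_c - mean_t) / pooled_sd
    # Common large-sample approximation of the standard error of d.
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return d, se


def ci95(estimate, se):
    """Two-sided 95% confidence interval under a normal approximation."""
    return estimate - 1.96 * se, estimate + 1.96 * se


def is_noninferior(ci_lower, margin=-0.2):
    """Noninferiority holds only if the whole CI lies above the margin."""
    return ci_lower > margin


def pool_random_effects(effects, ses):
    """Random-effects pooling (DerSimonian-Laird, an assumed estimator)."""
    w = [1 / se**2 for se in ses]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    # Between-trial variance, truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2


# Lower confidence limits reported in this review for I-CBT vs. live group CBT.
print(is_noninferior(0.03))   # social phobia, 95% CI 0.03 to 0.78 -> True
print(is_noninferior(-0.41))  # panic disorder, 95% CI -0.41 to 0.41 -> False
```

The margin check makes the asymmetry explicit: a point estimate of zero difference is not enough, because the lower confidence limit must also stay above d = -0.2.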

Publication bias 

Potential publication bias was assessed for plausibly effective 
interventions by inspecting funnel plots and by a trim-and-fill
procedure [37], which yields an estimate of the effect size after
taking bias into account (analyses performed in Comprehensive
Meta-Analysis v2, Biostat Inc.).

Quality of evidence (GRADE)

The international grading system GRADE [23] was used to
assess the quality of evidence for effects and safety with regard to
groups of studies relevant to each treatment and support type, 
population, and disorder, according to the following four levels: 

• High quality (⊕⊕⊕⊕) - We are very confident that the true
effect lies close to that of the estimate of the effect.

• Moderate quality (⊕⊕⊕○) - We are moderately confident in
the effect estimate: The true effect is likely to be close to the
estimate of the effect, but there is a possibility that it is
substantially different.

• Low quality (⊕⊕○○) - Our confidence in the effect estimate
is limited: The true effect may be substantially different from
the estimate of the effect.

• Very low quality (⊕○○○) - We have very little confidence
in the effect estimate: The true effect is likely to be substantially
different from the estimate of the effect.

In the GRADE system, evidence based on RCTs begins as high 
quality evidence, but may be rated down for several reasons, 
including study limitations, inconsistency of results, indirectness of 
evidence, imprecision or reporting bias. That is, for each type of 
treatment and support type, for each disorder and population, the 
quality of the evidence was assumed to be high at the outset, but 
subsequently rated down if there were limitations in the relevant 
studies. For example, there were three trials of CBT with clinical 
support vs. waiting list for adult participants diagnosed with panic 
disorder. Evidence of treatment efficacy starts as being of high
quality because the trials were RCTs, while study limitations 
(waiting list comparison [WLC]), inconsistency in the results (two 
trials show favorable effect, one shows no effect), and imprecision 
across studies (all three trials have small samples) entailed that the 
body of evidence finally received a low-quality rating. The quality 
of evidence was decided upon through discussions among the 
authors and input from an external group, the Quality and Priority 
Group at the agency. In line with agency guidelines we rated down 
for indirectness when only one RCT was included for a specific 
question, unless the included RCT was a multi-center trial. 

Results 

We identified 52 relevant trials (54 reports), whereof 12 trials (13 
reports) were excluded due to high risk of bias. The efficacy data 
thus included 39 reports with 40 RCTs of low or moderate risk of 
bias and 2 additional reports of long-term follow-ups of these trials 
that were included in the synthesis (Figure 1). Most trials recruited 
volunteers via advertisements, evaluated variations of therapist- 
guided I-CBT in self-help format carried out over 8-12 weeks and 
used a WLC (Table 1). The support was delivered via phone or 
email-like messages and took approximately 10-20 minutes per 
participant and week. Diagnoses were made mainly by using the 
MINI neuropsychiatric interview or the Structured Clinical 
Interview for DSM Axis-I Disorders (SCID) and the screening 
was performed in person or via telephone. The majority of the 
trials (88%) were conducted by teams from Australia or Sweden. 

Mood disorders in adults 

Nine trials were identified: eight had moderate risk of bias 
[31,38-44] and one had high risk of bias (Appendix S2). The
participants fulfilled criteria for a depressive episode, current or in 
partial remission, recurrent episodes, or dysthymia. No trials for 
bipolar disorder were found. Six trials included only participants
with mild to moderate depression and six trials excluded
participants who reported suicidal ideation. 

None of the trials assessed noninferiority. Five evaluated the 
effect of I-CBT vs. a WLC [41,42,44], WLC and weekly symptom 
ratings [39], or WLC and access to an online discussion group
[40]. We found a large pooled effect for I-CBT as compared to a
WLC (Figure 2). The quality of evidence was rated as moderate for 
therapist-guided I-CBT due to study limitations (WLC), see 
Table 2. 

Three other trials were included that evaluated one intervention 
each: one trial with an intervention that combined components 
from acceptance and commitment therapy, behavioral activation, 
and mindfulness [38]; one with internet-delivered psychodynamic 
therapy (I-PDT) [31], and one with therapist-led I-CBT delivered 
via a chat interface [43]. For each intervention, the quality of 
evidence was rated as very low. Four long-term follow-up
assessments (five reports) were assessed as having a high risk of 
bias [31,39,40,44,45]. 

Anxiety disorders in adults 

Social phobia. Sixteen trials were identified: three trials had 
low risk of bias [46-48], 10 trials reported in 9 publications had
moderate risk [49-57], and 3 trials had high risk. One
noninferiority trial with low risk of bias found that therapist-guided
I-CBT was superior to live group CBT, with an effect size
of d = 0.41 on the LSAS (blinded) [47]. The 95% CI (0.03 to 0.78)
was above our pre-defined noninferiority margin. We rated the
quality of evidence as low because of imprecision (sample size) and 
indirectness (single trial). 

Eight trials with moderate risk of bias evaluated the effect of 
therapist-guided I-CBT compared to a WLC [49-56]. The 
treatments conferred a large effect compared to WLC (Figure 3). 
The quality of evidence for therapist-guided I-CBT was rated as 
moderate due to study limitations (WLC). One report also 
evaluated whether therapist-guided I-CBT was superior to 
bibliotherapy [56]. I-CBT was not found to be superior to 
bibliotherapy. The quality of evidence for guided I-CBT vs. 
bibliotherapy was rated as very low due to imprecision (small 
sample) and indirectness (single trial). 

Two trials (three reports) [56,58,59] that included long-term follow-ups
of the treatment groups were assessed as having moderate risk
of bias. Their findings suggested that participants' improvements
persisted after 30 months [58] and 1 and 5 years [56,59]. The
quality of evidence was assessed as very low due to risk of bias and
imprecision. One trial found that unguided I-CBT was not
superior to a WLC [53]. The quality of evidence for unguided
I-CBT for social phobia was rated as very low due to study
limitations (WLC), imprecision (small sample), and indirectness 
(single trial). 

Three trials, two of low [46,48] and one of moderate risk of bias 
[57], compared internet-delivered Attention Bias Modification (I- 
ABM) to an identical sham intervention. We found no clinically 
relevant pooled effect (Figure 4 includes one of three primary 
outcomes; plots were nearly identical for the Social Phobia Scale 
and the Liebowitz Social Anxiety Scale). The quality of evidence 
was rated as moderate for a lack of clinically meaningful effect of I- 
ABM (rated down due to imprecision, i.e., small sample). 

Panic disorder. Nine trials of I-CBT were identified. Five 
trials had moderate risk [60-64] and four had high risk of bias 
(e.g., due to differences among groups in baseline characteristics, 
sample sizes, dropout). One trial found no difference between 
therapist-guided I-CBT and live group CBT in participants 
recruited from a clinical population (d = 0.00) [60]. Noninferiority
was not established as the 95% CI (-0.41 to 0.41) included our
predefined noninferiority margin of d = -0.20. One trial found no
difference between I-CBT and live individual CBT [63]. This trial
was not designed as a noninferiority trial and the small sample
limits the inferences to be made. The quality of evidence for the
noninferiority of I-CBT vs. either individual or group CBT thus
was rated as very low due to study limitations (e.g., insufficient
information about treatment integrity), imprecision (small sample),
and indirectness (single trial).

1713 records from the literature search
    1486 records excluded
227 reports retrieved and read in full text
    173 reports excluded:
        Not original data 43
        Not RCT 7
        Other research questions 40
        Other type of intervention 35
        Other population 46
        Trial protocol 2
52 trials (54 reports) assessed for risk of bias
    12 trials (13 reports) excluded due to high risk of bias
40 trials (41 reports):
    37 trials (38 reports) with moderate risk of bias
    3 trials with low risk of bias

Figure 1. Flowchart of included efficacy trials and additional reports of long-term follow-up assessments.
doi:10.1371/journal.pone.0098118.g001

PLOS ONE | www.plosone.org | May 2014 | Volume 9 | Issue 5 | e98118

Internet Treatment for Mood and Anxiety Disorders

Three trials that compared therapist-guided I-CBT vs. a WLC 
with [64] or without [61,62] online information about panic found 
small to very large effects. No meta-analysis was undertaken 
because of the heterogeneity in outcome measures and effect sizes. 
We rated the quality of evidence as low because of study
limitations (WLC, dropout) and imprecision (heterogeneous effect 
sizes, small samples). 

Generalized anxiety disorder. Four trials with moderate 
risk of bias were identified [65-68]. They evaluated therapist-guided
I-CBT vs. a WLC. The pooled effect was large although
heterogeneous across the trials (Figure 5). We rated the quality of 
evidence as low for the short-term effect because of study 
limitations (WLC) and imprecision (heterogeneous effect sizes, 
small samples). 

One trial found that I-CBT with non-clinical support by a
technician was more effective than a WLC [67]. One trial
included therapist-guided I-PDT [65]. As for the I-CBT condition
in this trial, no effect was found for I-PDT vs. WLC. The quality of
evidence for I-CBT with non-clinical support and therapist-guided
I-PDT was rated as very low because of study limitations (WLC,
only one technician), imprecision (small sample), and indirectness
(single trial).

Specific phobia. We identified one trial of moderate risk of 
bias [69]. Four weeks of therapist-guided I-CBT did not 
outperform brief therapist-led exposure (one introductory session 
and one three-hour exposure session) according to a behavioral 
approach test in participants with spider phobia. The quality of 
evidence was rated as very low due to study limitations, 
imprecision (small sample), and indirectness (single trial). 

Posttraumatic stress disorder. Two relevant trials were 
identified; one trial with high risk of bias and one trial with 
moderate risk of bias that found that therapist-guided I-CBT was 
superior to WLC [70]. The quality of evidence was rated as very
low due to study limitations (WLC), imprecision (small sample), 
and indirectness (single trial). 

Obsessive-compulsive disorder. We identified one trial of 
moderate risk of bias [71]. Therapist-guided I-CBT conferred a 
large effect compared to supportive therapy online. The quality of 
evidence was rated as very low due to study limitations (no credible 
active comparison condition), imprecision (small sample), and 
indirectness (single trial). 

Transdiagnostic interventions for anxiety and 
depression. Six trials of moderate risk of bias were identified 
that included participants with mixed anxiety disorders and/or 
MDD [72-77]. Five trials found that therapist-guided I-CBT had
moderate or large effects as compared to a WLC [72-74,76]. No
meta-analysis was performed due to the heterogeneity in outcome
measures, diagnoses, and treatment protocols. We rated the
quality of evidence as low for these interventions because of study
limitations (WLC) and imprecision (small samples, heterogeneous
effects and interventions). One trial included a 1- and 2-year
follow-up, with results suggesting that the improvements lasted
throughout the follow-up [74]. The quality of evidence was rated
as very low due to study limitations (observational design, dropout)
and imprecision (small sample). One trial recruited participants
from an anxiety clinic and found no difference between unguided
I-CBT and a WLC on the Patient Global Impression scale [75].
The quality of evidence was rated as very low due to study
limitations (WLC, dropout), imprecision (small sample), and
indirectness (single trial).

Figure 2. Short-term efficacy of therapist-guided internet-based cognitive behavioral therapy (I-CBT) vs. waiting list for depression
in adults. For the meta-analysis, the outcome chosen from each study was the Beck Depression Inventory I or II.
doi:10.1371/journal.pone.0098118.g002

Publication bias 

Funnel plots and Duval and Tweedie's trim-and-fill procedure
indicated no or trivial publication bias with respect to the pooled
effect sizes for I-CBT for adults with depression, social phobia, and
GAD. 

Children and adolescents 

We found four trials and excluded three due to high risk of bias 
because of various shortcomings. One trial of moderate risk of bias 
evaluated I-CBT for mixed anxiety disorders [78]: 30% of the
completers no longer fulfilled criteria for their primary anxiety
diagnosis, compared to 10% in the WLC. The quality of evidence 
was rated as very low for the efficacy of internet-based 
psychological interventions for children and adolescents (Table 2). 

Risk of adverse events 

Eight trials provided information on intervention-associated 
risks for depression [31,38], social phobia [46,54], GAD [65], 
OCD [71], and transdiagnostic treatments [74,76]. The informa- 
tion provided was related to a worsening in symptoms and 
indicated that symptom worsening was present in 0-5% of treated 
participants and in 2-9% of participants in the comparison 
groups. The quality of evidence was rated as very low for the risk 
of adverse events following internet-based psychological 
interventions for both children and adults (⊕○○○). 

Cost-effectiveness 

Of the 139 studies screened for cost-effectiveness data, five trials 
met the eligibility criteria. Two had a moderate risk of bias [79,80] 
and three were excluded due to high risk of bias (e.g. incomplete 
information on costs; Appendix S2). One trial compared costs and 
effects between I-CBT and treatment as usual while on waiting list 
among patients with depression [80], and found that I-CBT had a 
cost per QALY of 29,384 USD compared to treatment as usual. At 
a willingness-to-pay for a QALY of 50,000 USD, the probability 
was approximately 70% that I-CBT was cost-effective compared 
to treatment as usual. One trial compared costs and effects 
between I-CBT and group CBT among patients with social 
phobia [79]. Compared to group CBT, I-CBT was associated with 
a lower cost per patient of 1,422 USD and 19% greater 
improvement on LSAS at the six-month follow-up. At a 
willingness-to-pay per additionally improved patient of 3,000 
USD, the probability that I-CBT was cost-effective compared to 
group CBT was approximately 90%. The calculations of QALYs 
had not taken the time aspect of the effect on quality of life into 
account and are not presented. 
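
The quantities reported in this section (cost per QALY, willingness-to-pay, and the probability of being cost-effective) are linked by simple arithmetic. The following is a minimal sketch of that arithmetic, not the method used in the cited trials. It assumes that an uncertainty analysis, for example bootstrapping, yields a set of replicates of incremental cost and incremental effect. All function names and all numbers are hypothetical illustrations.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the effect difference is zero")
    return delta_cost / delta_effect


def net_monetary_benefit(delta_cost, delta_effect, wtp):
    """Value of the extra effect at the willingness-to-pay, minus the extra cost.

    A positive value means the new treatment is cost-effective at this threshold.
    """
    return wtp * delta_effect - delta_cost


def prob_cost_effective(replicates, wtp):
    """Share of (delta_cost, delta_effect) replicates with positive net benefit."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    favorable = sum(
        1 for dc, de in replicates if net_monetary_benefit(dc, de, wtp) > 0
    )
    return favorable / len(replicates)


# Hypothetical replicates of (incremental cost in USD, incremental QALYs).
# These are invented numbers, not data from the trials discussed above.
hypothetical_replicates = [
    (100.0, 0.01),   # cheap gain: favorable at a threshold of 50,000 USD
    (1000.0, 0.01),  # costly gain: not favorable at 50,000 USD
    (-50.0, 0.0),    # cost saving with no change in effect: favorable
    (10.0, -0.01),   # more costly and less effective: not favorable
]

share = prob_cost_effective(hypothetical_replicates, wtp=50_000)
print(f"Probability cost-effective at 50,000 USD per QALY: {share:.2f}")
```

With real trial data the replicates would come from resampling patient-level costs and outcomes; the short list here only shows how the threshold turns each replicate into a yes or no.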

Discussion 

In this review we assessed whether internet-delivered psycho- 
logical treatments for mood and anxiety disorders are efficacious, 
noninferior to established treatments, associated with risk of 
adverse events, and cost-effective. We found limited to moderate 
evidence that for adults who seek out this treatment, therapist- 
guided I-CBT has a favorable short-term effect compared to 
waiting list for social phobia, panic disorder, generalized anxiety 
disorder, or mild to moderate major depression. We were not able 
to draw conclusions about noninferiority to proven treatments, 
long-term effects, adverse events, cost-effectiveness, or efficacy 
when given to children and adolescents. 

Several reviews interpret the body of evidence such that I-CBT 
and established forms of CBT have comparable effects for mild to 
moderate depression and several anxiety disorders [19,20,81]. In 
contrast, we found insufficient evidence to conclude whether 
I-CBT is noninferior to face-to-face treatment. There are important 
aspects that need to be attended to with regard to the 
methodology, and ethics, of conducting trials with a placebo/no- 
treatment arm when there are existing evidence-based treatments 
[82,83]. These issues notwithstanding, we found few trials that 
compared I-CBT to a face-to-face treatment. These trials were 
generally not adequately designed to evaluate questions of 
noninferiority [27,84], with the exception of one trial, which 
provided tentative support for similar efficacy of therapist-guided 
I-CBT and group CBT for social phobia in adults [47]. 
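
A noninferiority judgment is commonly based on whether the confidence interval for the difference between the new and the established treatment excludes a prespecified margin. The sketch below illustrates that logic only. The margin of 0.2 standard deviations, the normal approximation, the example values, and the function name are assumptions made for illustration and are not the criteria applied in this review.

```python
def noninferiority_verdict(diff, se, margin, z=1.96):
    """Classify a head-to-head comparison against a noninferiority margin.

    diff   : standardized mean difference, new minus established treatment
             (positive values favor the new treatment)
    se     : standard error of diff
    margin : largest disadvantage still considered acceptable (positive number)
    z      : critical value for the two-sided confidence interval
    """
    if se <= 0 or margin <= 0:
        raise ValueError("se and margin must be positive")
    lower = diff - z * se
    upper = diff + z * se
    if lower > -margin:
        return "noninferior"   # whole interval lies above the margin
    if upper < -margin:
        return "inferior"      # whole interval lies below the margin
    return "inconclusive"      # interval straddles the margin


# Hypothetical comparisons: (difference, standard error). Invented values.
examples = {
    "large trial, no difference": (0.0, 0.05),
    "small trial, no difference": (0.0, 0.20),
    "clearly worse": (-0.6, 0.10),
}

for label, (diff, se) in examples.items():
    print(label, "->", noninferiority_verdict(diff, se, margin=0.2))
```

The second example shows why small trials rarely settle the question: a point estimate of zero difference is still inconclusive when the interval is wide enough to include the margin.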

There are a number of shortcomings with the existing trials that 
future studies would benefit from attending to. A common issue in 
these studies is that they were conducted by teams that developed 
the I-CBT program but had no role in developing the comparison 
face-to-face therapy. In addition, independent ratings of the 
quality of delivery of the therapy were not routinely included. 
Further, the face-to-face comparator was often group CBT and 
not individual CBT, although the latter is generally the first-line 
choice for anxiety and mood disorders [85,86]. The guidelines 
from the National Institute for Health and Care Excellence 
(NICE) do not support the notion of equivalence between internet- 
delivered and face-to-face treatment for social phobia [86], in part 
due to the aforementioned issues. More aptly designed trials are 
needed before we can answer clearly whether internet-delivered 
treatments are noninferior to face-to-face treatment. Furthermore, 
the lack of comparisons with established treatments provides scant 
data for cost-effectiveness analyses. Consequently, this review can 
provide no conclusions about the cost-effectiveness of I-CBT. 

Figure 3. Short-term efficacy of therapist-guided internet-based cognitive behavioral therapy (I-CBT) vs. waiting list for social 
phobia in adults. For the meta-analysis, the outcome chosen from each study was the Social Interaction Anxiety Scale. 
doi:10.1371/journal.pone.0098118.g003 

The diverging conclusions among extant reviews about equal 
efficacy between internet-delivered and face-to-face treatments 
highlight critical methodological aspects that set the present review 
apart from previous reviews [19,81,87]. First, we used rigorous 
criteria for establishing noninferiority, whereas previous reviews 
seemingly have relied on subjective and indirect appraisal of the 
effect size differences. Second, we performed a systematic assessment 
of the body of evidence for each disorder [23] whereas previous 
reviews either used no formal assessment or relied only on the criteria 
stated by Chambless et al. [88], which indicate as evidence-based 
treatment any treatment that has been found superior to any 
comparison condition in two RCTs. The grading of the body of 
evidence that we used here entailed a reduced confidence in the 
results mainly due to the fact that studies were unblinded, used 
subjective outcome measures, were designed with waiting list or 
similar comparison groups, and included relatively small samples. 

Third, we performed a comprehensive assessment of the risk of 
bias in the trials and excluded trials with high risk. Few trials for 
social phobia and depression were judged as having high risk of 
bias, which resulted in similar conclusions about short-term 
efficacy as the meta-analysis by Andrews et al. [19]. Similarly, 
for PTSD, OCD, specific phobia, and transdiagnostic treatments, 
only one trial (for PTSD) was excluded due to high risk. However, 
excluding high-risk studies resulted in fewer trials and a lower 
grading of the evidence for panic disorder than stated in previous 
reviews [19,89]. Of the four excluded publications on panic 
disorder, two were from 2001, one from 2006, and 
one from 2008 (see Appendix S2). Given the technical progress in 
the field and that the reports represent studies planned and 
performed some years before publication, at least the 2001 
publications are among the first in an emerging field and would 
bear less resemblance to current and future practice of internet- 
delivered treatment packages. 

We found only four relevant trials for children and adolescents, 
and three had high risk of bias. The three excluded trials 
concerned social anxiety, OCD, and diverse anxiety disorders 
(mainly GAD), respectively. Including them would not alter our 
conclusions. This turnout seems to reflect the slow progress in 
general among psychological interventions for children and 
adolescents [90]. Although the low number of studies precludes 
quantitative meta-analysis, an equally important objective of a 
systematic review is to identify gaps in the literature. This could 
alert researchers and funding agencies to important research 
questions that are not given sufficient attention. The effect of 
internet-delivered interventions in general may be smaller among 
children [91], which stresses the need for more research 
specifically for this population. 

Finally, the trials included in this review may seem few in 
comparison to the expanding number of publications in the 
literature. However, we only included studies of participants with 
diagnosed mood and anxiety disorders. There are many other 
studies on internet-treatments in which participants have not been 
subjected to a diagnostic interview. Several of those trials used 
unguided interventions, which may explain why so few trials of 
unguided interventions were included. Also, we did not pool 
studies across treatments and support types, or across disorders, 
and therefore each cluster of studies yielded a modest number of 
trials despite an impressive amount overall. 

Figure 4. Short-term efficacy of internet-based attention bias modification (I-ABM) vs. sham treatment for social phobia in adults. 
For the meta-analysis, the outcome chosen from each study was the Social Interaction Anxiety Scale. 
doi:10.1371/journal.pone.0098118.g004 

Figure 5. Short-term efficacy of internet-based cognitive behavioral therapy (I-CBT) vs. waiting list for generalized anxiety disorder 
in adults. For the meta-analysis, the outcome chosen from each study was the Penn State Worry Questionnaire. 
doi:10.1371/journal.pone.0098118.g005 

Remote delivery is one of several promising avenues for 
expanding the reach of psychological interventions [5]. Indeed, a 
key impetus in much of the reviewed research is to improve 
accessibility to CBT [63] and attract those normally too shy to seek 
treatment and those without access to CBT [92]. A central 
question, therefore, is whether internet-delivered treatment indeed 
attracts an underserved population. Among the trials of I-CBT for 
depression, 53-61% of participants had a history of psychological 
treatment [39,40,44]. Among the anxiety trials, 16-66% of 
participants had previously received psychological treatment 
[49-52,54,56,71] and one-fourth had received CBT [63,71]. 
The data indicate that many depression trial participants already 
had access to treatment whereas this seemed to be less clear for 
anxiety disorders. The high level of educational attainment and 
employment rates among the participants raise concerns about 
whether the effects found in most RCTs can be generalized to 
those who today are underserved. Other questions likely to be 
important to generalization concern how these treatment 
programs can be implemented within the healthcare services 
and what types of changes would be needed, for example, the 
training of existing therapists. Expanding the reach of psychological 
treatments is important [5]. We therefore concur with the 
NICE guidance [86] and hope for further research that attends to 
these issues in more detail. 

Several trials assessed long-term outcomes of the treatments. 
Yet, no clear conclusions could be drawn about long-term effects 
as these data were limited mainly due to the observational design, 
attrition, and the lack of data on participants' receipt of other 
treatments during the follow-up period. Only eight efficacy trials 
reported on deterioration, and no trial suggested that adverse 
events in a broader sense had been monitored. There is clearly a 
need for better reporting of risk of safety data [93,94]. Currently, 
the risk of reporting bias precludes conclusions about the risk- 
benefit ratio of the treatments, which is an important aspect of 
comparing treatments. 

Correlational evidence suggests that therapist guidance is 
beneficial for the outcome [16,95,96]. Less extensive support 
without adequate oversight of the patients' mental health status 
could also compromise patient safety. We therefore emphasize that 
evidence was found only for therapist-guided treatments. The lack 
of efficacy of I-ABM (also seen in a trial published after our final 
search [97]) compared to the effects of ABM in the laboratory 
[98], and of other remotely delivered therapies [99] further 
indicates the importance of attention to details about how 
interventions are delivered. 

We believe that using only trials with low or moderate risk of 
bias is an improvement over previous reviews. We are mindful of the 
fact that the ratings of risk of bias were subjective, which hampers 
the replicability of our findings. However, it is broadly recognized 
as poor review practice to disregard study quality altogether [26], 
for example, because of the impact of quality on effect estimates 
[28]. Instead of choosing a threshold approach, a quality-weighting 
approach can be used whereby low quality studies are 
included in the review and their influence is analyzed, thus 
avoiding selection bias. However, the assignment of quality 
weights is still fraught with subjectivity: unless rigorously 
implemented, it might increase the risk of over-inclusion bias and may 
result in inconsistency [100]. In addition, the use of simple scoring 
sheets for assessing bias is not recommended [26]. To minimize 
the uncertainty due to subjective judgments, we performed the 
ratings according to best practice: the risk of conflicts of interest 
was minimized by the choice of independent reviewers; we 
used comprehensive score sheets developed for risk of bias ratings 
in individual trials, with dual review; and we used the GRADE 
model for the overall assessment of the evidence [23]. Our ratings 
of the strength of evidence are related not only to specific 
treatment packages and comparison conditions (Table 2), but also 
to the particular population of adults seeking out treatment 
themselves. The majority of the included trials were conducted in 
Sweden or Australia, which greatly increases external validity 
within these countries but also warrants caution before 
extrapolating these findings to healthcare services in other 
countries and cultural settings. 

Conclusions 

I-CBT for adults with mild to moderate depression and select 
anxiety disorders may complement existing services. More 
research is needed before conclusions can be drawn about the 
efficacy of internet-delivered treatment regarding other anxiety 
disorders, other treatment methods than CBT, the treatment of 
children, long-term effects, safety, cost-effectiveness, and noninfer- 
iority to proven forms of treatment. We believe that a shift is 
warranted from waiting list trials to using active comparators, 
particularly direct comparisons with established treatments. 
Nonetheless, more research is needed to understand what makes 
psychological treatments effective, and for whom. This field 
unfolds rapidly, however, and it may not be long until remaining 
questions can be satisfactorily answered. 